WHAT IS CLAIMED IS: 



1. An apparatus for direct annotation of objects, the apparatus comprising:

a display device for displaying one or more images;

an audio input device for receiving an audio input; and

a direct annotation creation module coupled to the audio input device and the display device, the direct annotation creation module creating an annotation object that associates an input audio signal with an image displayed on the display device.
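The annotation object of claim 1 can be illustrated with a minimal data-structure sketch. The names `AnnotationObject`, `create_annotation`, `image_id` and `audio_samples` are hypothetical and do not come from the specification. The sketch assumes the audio signal is held as a list of raw samples and the image is referenced by a string identifier.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Optional

_ids = itertools.count(1)  # hypothetical source of annotation identifiers

@dataclass
class AnnotationObject:
    """Hypothetical annotation object associating an audio signal with an image."""
    image_id: str                   # reference to the displayed image
    audio_samples: List[float]      # recorded input audio signal (raw samples)
    text: Optional[str] = None      # optional text derived from the audio
    annotation_id: int = field(default_factory=lambda: next(_ids))

def create_annotation(image_id: str, audio_samples: List[float]) -> AnnotationObject:
    """Create an annotation object that links the audio input to the image."""
    if not audio_samples:
        raise ValueError("audio input is empty")
    return AnnotationObject(image_id=image_id, audio_samples=list(audio_samples))

note = create_annotation("img-001", [0.0, 0.2, -0.1])
print(note.image_id, len(note.audio_samples))  # img-001 3
```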

2. The apparatus of claim 1 further comprising an annotation display module coupled to the direct annotation creation module, the annotation display module generating symbols or text representing the annotation objects.

3. The apparatus of claim 1 further comprising an annotation audio output module coupled to the direct annotation creation module, the annotation audio output module generating audio output in response to user selection of an annotation symbol representing an annotation object.

4. The apparatus of claim 1 further comprising:

an audio vocabulary storage for storing a plurality of audio signals and corresponding text strings;

an audio vocabulary comparison module coupled to the audio input device, the audio vocabulary storage and the direct annotation creation module, the audio vocabulary comparison module receiving audio input and finding a corresponding text string that matches the audio input; and

wherein the direct annotation creation module uses text strings found by the audio vocabulary comparison module to create the audio annotation.
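One way to picture the audio vocabulary comparison module of claim 4 is a nearest-entry lookup. This is a sketch under stated assumptions: each stored audio signal is reduced to a fixed-length feature vector, cosine similarity stands in for whatever comparison the apparatus actually uses, and the function names and the 0.95 threshold are hypothetical.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical vocabulary: (text string, stored audio feature vector) pairs.
Vocabulary = List[Tuple[str, List[float]]]

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def find_matching_text(audio: List[float], vocab: Vocabulary,
                       threshold: float = 0.95) -> Optional[str]:
    """Return the text of the best-scoring entry if it meets the (assumed) threshold."""
    best_text, best_score = None, -1.0
    for text, stored in vocab:
        score = cosine_similarity(audio, stored)
        if score > best_score:
            best_text, best_score = text, score
    return best_text if best_score >= threshold else None

vocab = [("beach", [1.0, 0.0, 0.2]), ("birthday", [0.0, 1.0, 0.1])]
print(find_matching_text([0.98, 0.05, 0.2], vocab))  # beach
```

Returning `None` when nothing clears the threshold leaves the direct annotation creation module free to fall back to storing the audio alone.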

5. The apparatus of claim 1 further comprising: 

an audio vocabulary storage for storing a plurality of audio signals and 
corresponding text strings; 

a dynamic vocabulary updating module coupled to the audio vocabulary storage 
and the audio input device, the dynamic vocabulary updating module for 
displaying an interface to create a new entry in the audio vocabulary 
storage, the dynamic vocabulary updating module receiving an audio input 
and a text string and creating the new entry in the audio vocabulary 
storage. 
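The dynamic vocabulary updating module of claim 5 reduces, in the simplest case, to inserting a new (audio, text) pair into the vocabulary storage. In the minimal sketch below, the function name and the choice to replace an existing entry with the same text are assumptions, not taken from the specification.

```python
from typing import List, Tuple

def add_vocabulary_entry(vocab: List[Tuple[str, List[float]]],
                         audio: List[float], text: str) -> None:
    """Add a new (text, audio) entry; an existing entry with the same text is replaced."""
    if not text.strip():
        raise ValueError("text string must be non-empty")
    # Drop any previous entry for this text so the newest recording wins.
    vocab[:] = [(t, a) for t, a in vocab if t != text]
    vocab.append((text, list(audio)))

vocab = [("beach", [1.0, 0.0])]
add_vocabulary_entry(vocab, [0.3, 0.9], "sunset")
print([t for t, _ in vocab])  # ['beach', 'sunset']
```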

6. The apparatus of claim 1 further comprising a media object cache for storing 
media and annotation objects. 

7. An apparatus for direct annotation of objects, the apparatus comprising: 

a direct annotation creation module coupled to receive an input audio signal and a 
reference to an image, the direct annotation creation module creating an annotation object 
that associates a symbol or text with the image; and 

an annotation display module coupled to the direct annotation creation module, the annotation display module generating the symbol or text representing the annotation object on a display device.



8. An apparatus for direct annotation of objects, the apparatus comprising:

a direct annotation creation module coupled to receive an input audio signal and a reference to an image, the direct annotation creation module creating an annotation object that associates the input audio signal and the image; and

an annotation audio output module coupled to the direct annotation creation module, the annotation audio output module generating audio output in response to user selection of an annotation symbol representing the annotation object.


9. An apparatus for direct annotation of objects, the apparatus comprising:

a media object storage for storing media and annotation objects; and

a direct annotation creation module coupled to receive an input audio signal and a reference to an image, the direct annotation creation module creating an annotation object that associates the input audio signal and the image, the direct annotation creation module storing the audio annotation in the media object storage.
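The media object storage of claim 9 can be sketched as an in-memory store that keeps images and the annotation objects referencing them. Everything here (`MediaObjectStorage`, `Annotation`, the dictionary layout, and the rule that an annotated image must already be stored) is a hypothetical illustration rather than the claimed implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Annotation:
    """Hypothetical annotation object: image reference plus audio samples."""
    image_id: str
    audio_samples: List[float]

class MediaObjectStorage:
    """Hypothetical in-memory store for media objects and annotation objects."""

    def __init__(self) -> None:
        self.images: Dict[str, bytes] = {}
        self.annotations: Dict[str, List[Annotation]] = defaultdict(list)

    def add_image(self, image_id: str, data: bytes) -> None:
        self.images[image_id] = data

    def store_annotation(self, annotation: Annotation) -> None:
        # Assumed rule: only images already in storage may be annotated.
        if annotation.image_id not in self.images:
            raise KeyError(f"unknown image {annotation.image_id!r}")
        self.annotations[annotation.image_id].append(annotation)

    def annotations_for(self, image_id: str) -> List[Annotation]:
        # .get avoids creating empty entries for unannotated images.
        return list(self.annotations.get(image_id, []))

store = MediaObjectStorage()
store.add_image("img-1", b"\x89PNG...")
store.store_annotation(Annotation("img-1", [0.1, 0.2]))
print(len(store.annotations_for("img-1")))  # 1
```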

10. A method for direct annotation of objects, the method comprising the steps of:

displaying an image;

receiving audio input;

detecting selection of an image; and

creating an annotation between the selected image and the audio input.



11. The method of claim 10, wherein the step of displaying is performed before or simultaneously with the step of receiving.

12. The method of claim 10, wherein the step of receiving is performed before or simultaneously with the step of displaying.

13. The method of claim 10, wherein the step of detecting selection includes detecting a portion of the image; and wherein the annotation creates an association between the portion of the image and the audio input.
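Claim 13 ties the audio input to a selected portion of the image. A minimal hit-test sketch follows; it assumes a portion is an axis-aligned rectangle in pixel coordinates and that a selection is a single point, and the names `Region`, `RegionAnnotation` and `annotation_at` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Region:
    """Hypothetical rectangular portion of an image, in pixel coordinates."""
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height

@dataclass
class RegionAnnotation:
    image_id: str
    region: Region
    audio_samples: List[float]

def annotation_at(annotations: List[RegionAnnotation], image_id: str,
                  px: int, py: int) -> Optional[RegionAnnotation]:
    """Return the first annotation on this image whose region contains the point."""
    for ann in annotations:
        if ann.image_id == image_id and ann.region.contains(px, py):
            return ann
    return None

anns = [RegionAnnotation("img-1", Region(10, 10, 50, 40), [0.1])]
print(annotation_at(anns, "img-1", 20, 30) is not None)  # True
```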

14. The method of claim 10, further comprising the step of displaying a visual notation that the image has an annotation.

15. The method of claim 14, wherein the visual notation is text or a symbol.

16. The method of claim 10, wherein the step of creating an annotation includes creating an annotation object and storing the annotation object in an object storage.

17. The method of claim 10, further comprising the step of recording the audio input received.
18. The method of claim 17, wherein the step of creating an annotation includes
creating an annotation object and storing the recorded audio input as part of the 
annotation object. 

19. The method of claim 10, further comprising the step of comparing the audio 
input to a vocabulary to produce text. 

20. The method of claim 19, wherein the step of creating an annotation includes 
creating an annotation object and storing the text as part of the annotation object. 

21. The method of claim 10, further comprising the steps of: 
comparing the audio input to a vocabulary; 

determining if the audio input has a matching entry in the vocabulary; and 
storing the entry as part of the annotation object if the audio input has a matching 
entry in the vocabulary. 

22. The method of claim 21, further comprising the steps of:
determining if the audio input has a close match in the vocabulary; 
displaying the close matches; 

receiving input selecting a close match; and 

storing the selected close match as part of the annotation object if the audio input 
has a close match in the vocabulary. 
23. The method of claim 22, further comprising the step of displaying a message that the image has not been annotated if there is neither a matching entry in the vocabulary nor a close match in the vocabulary.

24. The method of claim 22, further comprising the following steps if there is neither a matching entry in the vocabulary nor a close match in the vocabulary:

receiving text input corresponding to the audio input;

updating the vocabulary with a new entry including the audio input and the text input; and

wherein the received text is stored as part of the annotation object.

25. The method of claim 10, further comprising the steps of:

receiving text input corresponding to the audio input;

updating the vocabulary with a new entry including the audio input and the text input.
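Claims 21 through 25 describe a three-way outcome: a matching entry, one or more close matches offered to the user, or neither, in which case text is requested and the vocabulary is updated. The sketch below assumes feature-vector audio and cosine similarity; the two thresholds (0.95 for a match, 0.80 for a close match) and all function names are hypothetical. User interaction is passed in as callbacks so the flow can run without a real interface.

```python
import math
from typing import Callable, List, Optional, Tuple

# Hypothetical vocabulary: (text string, stored audio feature vector) pairs.
Vocabulary = List[Tuple[str, List[float]]]

def similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two feature vectors (stand-in comparison)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def resolve_annotation_text(
    audio: List[float],
    vocab: Vocabulary,
    choose_close_match: Callable[[List[str]], Optional[str]],
    ask_for_text: Callable[[], Optional[str]],
    match_threshold: float = 0.95,   # assumed cut-off for a matching entry
    close_threshold: float = 0.80,   # assumed cut-off for a close match
) -> Optional[str]:
    """Return the text to store in the annotation object, or None if not annotated."""
    scored = sorted(((similarity(audio, a), t) for t, a in vocab), reverse=True)
    # Matching entry: store it directly.
    if scored and scored[0][0] >= match_threshold:
        return scored[0][1]
    # Close matches: display them and let the user select one.
    close = [t for s, t in scored if s >= close_threshold]
    if close:
        return choose_close_match(close)
    # Neither: request text and add a new vocabulary entry.
    text = ask_for_text()
    if text:
        vocab.append((text, list(audio)))
    return text  # None: caller displays a "not annotated" message
```

A `None` result corresponds to the message step of claim 23; a returned string is what would be stored as part of the annotation object.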

26. A method for direct annotation of objects, the method comprising the steps of:

displaying an image;

receiving audio input;

detecting selection of an image;

comparing the audio input to a vocabulary to produce text; and

creating an annotation between the selected image and the text.
27. The method of claim 26, further comprising the step of recording the audio
input received. 

28. The method of claim 27, wherein the step of creating an annotation includes 
creating an annotation object including a reference to the selected image, the recorded 
audio input and the text, and storing the annotation object in an object storage. 

29. The method of claim 26, wherein the step of creating an annotation includes 
creating an annotation object and storing the text as part of the annotation object. 

30. The method of claim 26, further comprising the steps of: 
determining if the audio input has a matching entry in the vocabulary; and 
storing the entry as part of the annotation object if the audio input has a matching entry in the vocabulary.

30. The method of claim 29, further comprising the steps of: 
determining if the audio input has a close match in the vocabulary; 
displaying the close matches; 
receiving input selecting a close match; and 

storing the selected close match as part of the annotation object if the audio input 
has a close match in the vocabulary. 
31. The method of claim 30, further comprising the step of displaying a message
that the image has not been annotated if there is neither a matching entry in the 
vocabulary nor a close match in the vocabulary. 

32. The method of claim 30, further comprising the following steps if there is 
neither a matching entry in the vocabulary nor a close match in the vocabulary: 

receiving text input corresponding to the audio input; 

updating the vocabulary with a new entry including the audio input and the text 
input; and 

wherein the received text is stored as part of the annotation object. 

33. The method of claim 26, further comprising the steps of: 
receiving text input corresponding to the audio input; 

updating the vocabulary with a new entry including the audio input and the text input.

34. A method for displaying objects with annotations, the method comprising the 
steps of: 

retrieving an image; 

displaying the image with a visual notation that an annotation exists;

receiving user selection of an image; and 

outputting an annotation associated with the selected image.
35. The method of claim 34, wherein the annotation is text and the step of
outputting is displaying the text proximate an image that it annotates. 

36. The method of claim 34, wherein the annotation is an audio signal and the 
step of outputting is playing the audio signal. 

37. The method of claim 34, further comprising the steps of: 
determining whether the annotation includes text; 
retrieving a text annotation for the selected image; and 
displaying the retrieved text with the image. 

38. The method of claim 34, further comprising the steps of: 
determining whether the annotation includes an audio signal; 
retrieving an audio signal for the selected image; and
wherein the step of outputting is playing the audio signal. 
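The output step of claims 34 through 38 branches on what the annotation contains: text is displayed near the image and an audio signal is played. In the minimal sketch below, the display and playback functions are passed in as callbacks because the claims do not name a particular user-interface or audio API; `StoredAnnotation` and `output_annotation` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class StoredAnnotation:
    """Hypothetical annotation that may hold text, audio, or both."""
    image_id: str
    text: Optional[str] = None
    audio_samples: Optional[List[float]] = None

def output_annotation(ann: StoredAnnotation,
                      show_text: Callable[[str, str], None],
                      play_audio: Callable[[List[float]], None]) -> None:
    """Display any text next to the image and play any audio signal."""
    if ann.text:
        show_text(ann.image_id, ann.text)
    if ann.audio_samples:
        play_audio(ann.audio_samples)

shown, played = [], []
output_annotation(StoredAnnotation("img-1", text="beach", audio_samples=[0.1]),
                  lambda image_id, text: shown.append((image_id, text)),
                  played.append)
print(shown, played)  # [('img-1', 'beach')] [[0.1]]
```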

39. A method for retrieving images, the method comprising the steps of: 
receiving audio input; 

determining annotation objects that reference a close match to the audio input; 
retrieving the images that are referenced by the determined annotation objects; 
and 

displaying the retrieved images. 
40. The method of claim 39, wherein the step of determining annotation objects further comprises the steps of:

comparing the audio input to an audio signal referenced by an annotation object; and

determining a close match between the audio input and the audio signal referenced by the annotation object if a probability metric is greater than a threshold of 80%.

41. The method of claim 39, wherein the step of determining annotation objects further comprises the steps of:

determining the annotation objects for a plurality of images;

for each annotation object, comparing the audio input to an audio signal referenced by the annotation object; and

determining a close match between the audio input and the audio signal referenced by the annotation object if a probability metric is greater than a threshold of 80%.
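Claims 39 through 41 retrieve images by spoken query, treating an annotation as a close match when a probability metric exceeds 80%. The claims do not define that metric. In this sketch, cosine similarity between assumed feature vectors is rescaled from [-1, 1] to [0, 1] as a stand-in, and the function names and data layout are hypothetical.

```python
import math
from typing import List, Tuple

def match_probability(a: List[float], b: List[float]) -> float:
    """Stand-in probability metric: cosine similarity rescaled to [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    cos = dot / norm if norm else 0.0
    return (cos + 1.0) / 2.0

def retrieve_images(query: List[float],
                    annotations: List[Tuple[str, List[float]]],
                    threshold: float = 0.80) -> List[str]:
    """Return ids of images whose annotation audio is a close match to the query."""
    matches: List[str] = []
    for image_id, audio in annotations:  # one (image id, audio) pair per annotation object
        if match_probability(query, audio) > threshold and image_id not in matches:
            matches.append(image_id)
    return matches

anns = [("img-1", [1.0, 0.0]), ("img-2", [0.0, 1.0]), ("img-3", [0.9, 0.1])]
print(retrieve_images([1.0, 0.0], anns))  # ['img-1', 'img-3']
```

The returned identifiers would then be used to fetch and display the referenced images.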